MDPTAGS 7

GGCTGCTCGCGGAGGGGCAGTGTACGCGGGGCCGCTGTAGGCTGTCCAGGG ATG GAT CCC ACC GCG GGA AGC 21 

KKEPGGGAATEEGVNRIAVP 27

AAG AAG GAG CCT GGA GGA GGC GCG GCG ACT GAG GAG GGC GTG AAT AGG ATC GCA GTG CCA 81 

KPPSIEEFSIVKPISRGAFG 47 

AAA CCG CCC TCC ATT GAG GAA TTC AGC ATA GTG AAG CCC ATT AGC CGG GGC GCC TTC GGG 141 

KVYLGQKGGKLYAVKVVKKA 67 

AAA GTG TAT CTG GGG CAG AAA GGC GGC AAA TTG TAT GCA GTA AAG GTT GTT AAA AAA GCA 201 

DMINKNMTHQVQAERDALAL 87 

GAC ATG ATC AAC AAA AAT ATG ACT CAT CAG GTC CAA GCT GAG AGA GAT GCA CTG GCA CTA 261 

SKSPFIVHLYYSLQSANNVY 107 

AGC AAA AGC CCA TTC ATT GTC CAT TTG TAT TAT TCA CTG CAG TCT GCA AAC AAT GTC TAC 321 

LVMEYLIGGDVKSLLHIYGY 127 

TTG GTA ATG GAA TAT CTT ATT GGG GGA GAT GTC AAG TCT CTC CTA CAT ATA TAT GGT TAT 381 

FDEEMAVKYISEVALALDYL 147 

TTT GAT GAA GAG ATG GCT GTG AAA TAT ATT TCT GAA GTA GCA CTG GCT CTA GAC TAC CTT 441 

HRHGIIHRDLKPDNMLISNE 167

CAC AGA CAT GGA ATC ATC CAC AGG GAC TTG AAA CCG GAC AAT ATG CTT ATT TCT AAT GAG 501 

GHIKLTDFGLSKVTLNRDIN 187 

GGT CAT ATT AAA CTG ACG GAT TTT GGC CTT TCA AAA GTT ACT TTG AAT AGA GAT ATT AAT 561 

MMDILTTPSMAKPRQDYSRT 207 

ATG ATG GAT ATC CTT ACA ACA CCA TCA ATG GCA AAA CCT AGA CAA GAT TAT TCA AGA ACC 621 

PGQVLSLISSLGFNTPIAEK 227 

CCA GGA CAA GTG TTA TCG CTT ATC AGC TCG TTG GGA TTT AAC ACA CCA ATT GCA GAA AAA 681 

NQDPANILSACLSETSQLSQ 247 

AAT CAA GAC CCT GCA AAC ATC CTT TCA GCC TGT CTG TCT GAA ACA TCA CAG CTT TCT CAA 741 

GLVCPMSVDQKDTTPYSSKL 267

GGA CTC GTA TGC CCT ATG TCT GTA GAT CAA AAG GAC ACT ACG CCT TAT TCT AGC AAA TTA 801 

LKSCLETVASNPGMPVKCLT 287 

CTA AAA TCA TGT CTT GAA ACA GTT GCC TCC AAC CCA GGA ATG CCT GTG AAG TGT CTA ACT 861 

SNLLQSRKRLATSSASSQSH 307 

TCT AAT TTA CTC CAG TCT AGG AAA AGG CTG GCC ACA TCC AGT GCC AGT AGT CAA TCC CAC 921 

TFISSVESECHSSPKWEKDC 327 

ACC TTC ATA TCC AGT GTG GAA TCA GAA TGC CAC AGC AGT CCC AAA TGG GAA AAA GAT TGC 981 

QESDEALGPTMMSWNAVEKL 347 

CAG GAA AGT GAT GAA GCA TTG GGC CCA ACA ATG ATG AGT TGG AAT GCA GTT GAA AAG TTA 1041 

CAKSANAIETKGFNKKDLEL 367 

TGC GCA AAA TCT GCA AAT GCC ATT GAG ACG AAA GGT TTC AAT AAA AAG GAT CTG GAG TTA 1101 

ALSPIHNSSALPTTGRSCVN 387 
GCT CTT TCT CCC ATT CAT AAC AGC AGT GCC CTT CCC ACC ACT GGA CGC TCT TGT GTA AAC 1161

LAKKCFSGEVSWEAVELDVN 407

CTT GCT AAA AAA TGC TTC TCT GGG GAA GTT TCT TGG GAA GCA GTA GAA CTG GAT GTA AAT 1221

NINMDTDTSQLGFHQSNQWA 427 

AAT ATA AAT ATG GAC ACT GAC ACA AGT CAG TTA GGT TTC CAT CAG TCA AAT CAG TGG GCT 1281 

VDSGGISEEHLGKRSLKRNF 447 

GTG GAT TCT GGT GGG ATA TCT GAA GAG CAC CTT GGG AAA AGA AGT TTA AAA AGA AAT TTT 1341 

ELVDSSPCKKIIQNKKTCVE 467 

GAG TTG GTT GAC TCC AGT CCT TGT AAA AAA ATT ATA CAG AAT AAA AAA ACT TGT GTA GAG 1401 

YKHNEMTNCYTNQNTGLTVE 487 

TAT AAG CAT AAC GAA ATG ACA AAT TGT TAT ACA AAT CAA AAT ACA GGC TTA ACA GTT GAA 1461 

VQDLKLSVHKSQQNDCANKE 507 

GTG CAG GAC CTT AAG CTA TCA GTG CAC AAA AGT CAA CAA AAT GAC TGT GCT AAT AAG GAG 1521 

NIVNSFTDKQQTPEKLPIPM 527 

AAC ATT GTC AAT TCT TTT ACT GAT AAA CAA CAA ACA CCA GAA AAA TTA CCT ATA CCA ATG 1581 

IAKNLMCELDEDCEKNSKRD 547 

ATA GCA AAA AAC CTT ATG TGT GAA CTC GAT GAA GAC TGT GAA AAG AAT AGT AAG AGG GAC 1641 

YLSSSFLCSDDDRASKNISM 567 

TAC TTA AGT TCT AGT TTT CTA TGT TCT GAT GAT GAT AGA GCT TCT AAA AAT ATT TCT ATG 1701 

NSDSSFPGISIMESPLESQP 587 

AAC TCT GAT TCA TCT TTT CCT GGA ATT TCT ATA ATG GAA AGT CCA TTA GAA AGT CAG CCC 1761 

LDSDRSIKESSFEESNIEDP 607 

TTA GAT TCA GAT AGA AGC ATT AAA GAA TCC TCT TTT GAA GAA TCA AAT ATT GAA GAT CCA 1821 

LIVTPDCQEKTSPKGVENPA 627 

CTT ATT GTA ACA CCA GAT TGC CAA GAA AAG ACC TCA CCA AAA GGT GTC GAG AAC CCT GCT 1881 

VQESNQKMLGPPLEVLKTLA 647 

GTA CAA GAG AGT AAC CAA AAA ATG TTA GGT CCT CCT TTG GAG GTG CTG AAA ACG TTA GCC 1941 

SKRNAVAFRSFNSHINASNN 667 

TCT AAA AGA AAT GCT GTT GCT TTT CGA AGT TTT AAC AGT CAT ATT AAT GCA TCC AAT AAC 2001 

SEPSRMNMTSLDAMDISCAY 687 

TCA GAA CCA TCC AGA ATG AAC ATG ACT TCT TTA GAT GCA ATG GAT ATT TCG TGT GCC TAC 2061 

SGSYPMAITPTQKRRSCMPH 707 

AGT GGT TCA TAT CCC ATG GCT ATA ACC CCT ACT CAA AAA AGA AGA TCC TGT ATG CCA CAT 2121 

QQTPNQIKSGTPYRTPKSVR 727 

CAG CAG ACC CCA AAT CAG ATC AAG TCG GGA ACT CCA TAC CGA ACT CCG AAG AGT GTG AGA 2181 

RGVAPVDDGRILGTPDYLAP 747 

AGA GGG GTG GCC CCC GTT GAT GAT GGG CGA ATT CTA GGA ACC CCA GAC TAC CTT GCA CCT 2241 

ELLLGRAHGPAVDWWALGVC 767 

GAG CTG TTA CTA GGC AGG GCC CAT GGT CCT GCG GTA GAC TGG TGG GCA CTT GGA GTT TGC 2301 

LFEFLTGIPPFNDETPQQVF 787 

TTG TTT GAA TTT CTA ACA GGA ATT CCC CCT TTC AAT GAT GAA ACA CCA CAA CAA GTA TTC 2361 
QNILKRDIPWPEGEEKLSDN 807

CAG AAT ATT CTG AAA AGA GAT ATC CCT TGG CCA GAA GGT GAA GAA AAG TTA TCT GAT AAT 2421 

AQSAVEILLTIDDTKRAGMK 827 

GCT CAA AGT GCA GTA GAA ATA CTT TTA ACC ATT GAT GAT ACA AAG AGA GCT GGA ATG AAA 2481 

ELKRHPLFSDVDWENLQHQT 847 

GAG CTA AAA CGT CAT CCT CTC TTC AGT GAT GTG GAC TGG GAA AAT CTG CAG CAT CAG ACT 2541 

MPFIPQPDDETDTSYFEARN 867 

ATG CCT TTC ATC CCC CAG CCA GAT GAT GAA ACA GAT ACC TCC TAT TTT GAA GCC AGG AAT 2601 

TAQHLTVSGFSL* 880 

ACT GCT CAG CAC CTG ACC GTA TCT GGA TTT AGT CTG TAG 2640 

CACAAAAATTTTCCTTTTAGTCTAGCCTCGTGTTATAGAATGAACTTGCATAATTATATACTCCTTAATACTAGATTGA 
TCTAAGGGGGAAAGATCATTATTTAACCTAGTTCAATGTGCTTTTAATGTACGTTACAGCTTTCACAGAGTTAAAAGGC 
TGAAAGGAATATAGTCAGTAATTTATCTTAACCTCAAAACTGTATATAAATCTTCAAAGCTTTTTTCATCTATTTATTT 
TGTTTATTGCACTTTATGAAAACTGAAGCATCAATAAAATTAGAGGACACTATTGAAAAAAAAAAAAAAAAAAAA 
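The one-letter residue lines printed above each codon line can be re-derived from the codons themselves, which is a convenient way to catch transcription errors in either row. The following is a minimal sketch assuming the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative and do not appear in the figures.

```python
# Standard genetic code, laid out in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(codon_line: str) -> str:
    """Translate a whitespace-separated codon line into one-letter residues.

    Stop codons are returned as '*', matching the notation in the figures.
    """
    codons = codon_line.upper().split()
    return "".join(CODON_TABLE[codon] for codon in codons)


# First and last codon lines of the open reading frame listed above.
print(translate("ATG GAT CCC ACC GCG GGA AGC"))
print(translate("ACT GCT CAG CAC CTG ACC GTA TCT GGA TTT AGT CTG TAG"))
```

Lower-case codon lines, as used in one of the listings, are handled by the `upper()` call; untranslated flanking sequence must be removed before calling `translate`.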
GGGCCCGCGGGCCGCCTGCTGCCTCCGCCCGCGCCGGGGTCCCCAGCCGCCCCCGCTGCCGTGTCCCCTGCGGCCGGCC

MPARIGYYEID 11 

AGCCGCGTCCCCCAGCCCCGGCCTCCCGCGGACCC ATG CCC GCC CGT ATC GGC TAC TAC GAG ATC GAC 33 

RTIGKGNFAVVKRATHLVTK 31 

CGC ACC ATC GGC AAG GGC AAC TTC GCG GTG GTC AAG CGG GCC ACG CAC CTC GTC ACC AAG 93 

AKVAIKIIDKTQLDEENLKK 51 

GCC AAG GTT GCT ATC AAG ATC ATA GAT AAG ACC CAG CTG GAT GAA GAA AAC TTG AAG AAG 153 

IFREVQIMKMLCHPHIIRLY 71 

ATT TTC CGG GAA GTT CAA ATT ATG AAG ATG CTT TGC CAC CCC CAT ATC ATC AGG CTC TAC 213 

QVMETERMIYLVTEYASGGE 91 

CAG GTT ATG GAG ACA GAA CGG ATG ATT TAT CTG GTG ACA GAA TAT GCT AGT GGA GGG GAA 273 

IFDHLVAHGRMAEKEARRKF 111 

ATA TTT GAC CAC CTG GTG GCC CAT GGT AGA ATG GCA GAA AAG GAG GCA CGT CGG AAG TTC 333 

KQIVTAVYFCHCRNIVHRDL 131 

AAA CAG ATC GTC ACA GCT GTC TAT TTT TGT CAC TGT CGG AAC ATT GTT CAT CGT GAT TTA 393 

KAENLLLDANLNIKIADFGF 151 

AAA GCT GAA AAT TTA CTT CTG GAT GCC AAT CTG AAT ATC AAA ATA GCA GAT TTT GGT TTC 453 

SNLFTPGQLLKTWCGSPPYA 171 

AGT AAC CTC TTC ACT CCT GGG CAG CTG CTG AAG ACC TGG TGT GGC AGC CCT CCC TAT GCT 513 

APELFEGKEYDGPKVDIWSL 191

GCA CCT GAA CTC TTT GAA GGA AAA GAA TAT GAT GGG CCC AAA GTG GAC ATC TGG AGC CTT 573 

GVVLYVLVCGALPFDGSTLQ 211 

GGA GTT GTC CTC TAC GTG CTT GTG TGC GGT GCC CTG CCA TTT GAT GGA AGC ACA CTG CAG 633 

NLRARVLSGKFRIPFFMSTE 231 

AAT CTG CGG GCC CGC GTG CTG AGT GGA AAG TTC CGC ATC CCA TTT TTT ATG TCC ACA GAA 693 

CEHLIRHMLVLDPNKRLSME 251 

TGT GAG CAT TTG ATC CGC CAT ATG TTG GTG TTA GAT CCC AAT AAG CGC CTC TCC ATG GAG 753 

QICKHKWMKLGDADPNFDRL 271 

CAG ATC TGC AAG CAC AAG TGG ATG AAG CTA GGG GAC GCC GAT CCC AAC TTT GAC AGG TTA 813 

IAECQQLKEERQVDPLNEDV 291 

ATA GCT GAA TGC CAA CAA CTA AAG GAA GAA AGA CAG GTG GAC CCC CTG AAT GAG GAT GTC 873 

LLAMEDMGLDKEQTLQAEQA 311

CTC TTG GCC ATG GAG GAC ATG GGA CTG GAC AAA GAA CAG ACA CTG CAG GCG GAG CAG GCA 933 

GTAMNISVPQVQLINPENQI 331

GGT ACT GCT ATG AAC ATC AGC GTT CCC CAG GTG CAG CTG ATC AAC CCA GAG AAC CAA ATT 993 

VEPDGTLNLDSDEGEEPSPE 351 

GTG GAG CCG GAT GGG ACA CTG AAT TTG GAC AGT GAT GAG GGT GAA GAG CCT TCC CCT GAA 1053 

ALVRYLSMRRHTVGVADPRT 371 

GCA TTG GTG CGC TAT TTG TCA ATG AGG AGG CAC ACA GTG GGT GTG GCT GAC CCA CGC ACG 1113 
EVMEDLQKLLPGFPGVNPQA 391

GAA GTT ATG GAA GAT CTG CAG AAG CTC CTA CCT GGC TTT CCT GGA GTC AAC CCC CAG GCT 1173 

PFLQVAPNVNFMHNLLPMQN 411 

CCA TTC CTG CAG GTG GCC CCT AAT GTG AAC TTC ATG CAC AAC CTG TTG CCT ATG CAA AAC 1233 

LQPTGQLEYKEQSLLQPPTL 431 

TTG CAA CCA ACC GGG CAA CTT GAG TAC AAG GAG CAG TCT CTC CTA CAG CCG CCC ACG CTA 1293 

QLLNGMGPLGRRASDGGANI 451 

CAG CTG TTG AAT GGA ATG GGC CCC CTT GGC CGG AGG GCA TCA GAT GGA GGA GCC AAC ATC 1353 

QLHAQQLLKRPRGPSPLVTM 471 

CAA CTG CAT GCC CAG CAG CTG CTG AAG CGC CCA CGG GGA CCC TCT CCG CTT GTC ACC ATG 1413 

TPAVPAVTPVDEESSDGEPD 491 

ACA CCA GCA GTG CCA GCA GTT ACC CCT GTG GAC GAG GAG AGC TCA GAC GGG GAG CCA GAC 1473 

QEAVQRYLANRSKRHTLAMT 511 

CAG GAA GCT GTG CAG AGG TAC TTG GCA AAT AGG TCC AAA AGA CAT ACA CTG GCC ATG ACC 1533 

NPTAEIPPDLQRQLGQQPFR 531 

AAC CCT ACA GCT GAG ATC CCA CCG GAC CTA CAA CGG CAG CTA GGA CAG CAG CCT TTC CGT 1593 

SRVWPPHLVPDQHRSTYKDS 551 

TCC CGG GTC TGG CCT CCT CAC CTG GTA CCT GAT CAG CAT CGC TCT ACC TAC AAG GAC TCC 1653 

NTLHLPTERFSPVRRFSDGA 571 

AAC ACT CTG CAC CTC CCT ACG GAG CGT TTC TCC CCT GTG CGC CGG TTC TCA GAT GGG GCT 1713 

ASIQAFKAHLEKMGNNSSIK 591

GCG AGC ATC CAG GCC TTC AAA GCT CAC CTG GAA AAA ATG GGC AAC AAC AGC AGC ATC AAA 1773 

QLQQECEQLQKMYGGQIDER 611 

CAG CTG CAG CAG GAG TGT GAG CAG CTG CAG AAG ATG TAC GGG GGG CAG ATT GAT GAA AGA 1833 

TLEKTQQQHMLYQQEQHHQI 631 

ACC CTG GAG AAG ACC CAG CAG CAG CAT ATG TTA TAC CAG CAG GAG CAG CAC CAT CAA ATT 1893 

LQQQIQDSICPPQPSPPLQA 651 

CTC CAG CAA CAA ATT CAA GAC TCT ATC TGT CCT CCT CAG CCA TCT CCA CCT CTT CAG GCT 1953 

ACENQPALLTHQLQRLRIQP 671 

GCA TGT GAA AAT CAG CCA GCC CTC CTT ACC CAT CAG CTC CAG AGG TTA AGG ATT CAG CCT 2013 

SSPPPNHPNNHLFRQPSNSP 691 

TCA AGC CCA CCC CCC AAC CAC CCC AAC AAC CAT CTC TTC AGG CAG CCC AGT AAT AGT CCT 2073 

PPMSSAMIQPHGAASSSQFQ 711 

CCC CCC ATG AGC AGT GCC ATG ATC CAG CCT CAC GGG GCT GCA TCT TCT TCC CAG TTT CAA 2133 

GLPSRSAIFQQQPENCSSPP 731 

GGC TTA CCT TCC CGC AGT GCA ATC TTT CAG CAG CAA CCT GAG AAC TGT TCC TCT CCT CCC 2193 

NVALTCLGMQQPAQSQQVTI 751 

AAC GTG GCA CTA ACC TGC TTG GGT ATG CAG CAG CCT GCT CAG TCA CAG CAG GTC ACC ATC 2253 

QVQEPVDMLSNMPGTAAGSS 771 

CAA GTC CAA GAG CCT GTT GAC ATG CTC AGC AAC ATG CCA GGC ACA GCT GCA GGC TCC AGT 2313 
GRGISISPSAGQMQMQHRTN 791

GGG CGC GGC ATC TCC ATC AGC CCC AGT GCT GGT CAG ATG CAG ATG CAG CAC CGT ACC AAC 2373 

LMATLSYGHRPLSKQLSADS 811 

CTG ATG GCC ACC CTC AGC TAT GGG CAC CGT CCC TTG TCC AAG CAG CTG AGT GCT GAC AGT 2433 

AEAHSAHQQPPHYTTSALQQ 831 

GCA GAG GCT CAC AGT GCA CAT CAG CAG CCG CCA CAC TAT ACC ACG TCG GCA CTA CAG CAG 2493 

ALLSPTPPDYTRHQQVPHIL 851 

GCC CTG CTG TCT CCC ACG CCG CCA GAC TAT ACA AGA CAC CAG CAG GTA CCC CAC ATC CTT 2553 

QGLLSPRHSLTGHSDIRLPP 871 

CAA GGA CTG CTT TCT CCC CGG CAT TCG CTC ACC GGC CAC TCG GAC ATC CGG CTG CCC CCA 2613 

TEFAQLIKRQQQQRQQQQQQ 891 

ACA GAG TTT GCA CAG CTC ATT AAA AGG CAG CAG CAA CAA CGG CAG CAG CAG CAG CAA CAG 2673 

QQQQEYQELFRHMNQGDAGS 911 

CAG CAA CAG CAA GAA TAC CAG GAA CTG TTC AGG CAC ATG AAC CAA GGG GAT GCG GGG AGT 2733 

LAPSLGGQSMTERQALSYQN 931 

CTG GCT CCC AGC CTT GGG GGA CAG AGC ATG ACA GAG CGC CAG GCT TTA TCT TAT CAA AAT 2793 

ADSYHHHTSPQHLLQIRAQE 951 

GCT GAC TCT TAT CAC CAT CAC ACC AGC CCC CAG CAT CTG CTA CAA ATC AGG GCA CAA GAA 2853 

CVSQASSPTPPHGYAHQPAL 971 

TGT GTC TCA CAG GCT TCC TCA CCC ACC CCG CCC CAC GGG TAT GCT CAC CAG CCG GCA CTG 2913 

MHSESMEEDCSCEGAKDGFQ 991

ATG CAT TCA GAG AGC ATG GAG GAG GAC TGC TCG TGT GAG GGG GCC AAG GAT GGC TTC CAA 2973 

DSKSSSTLTKGCHDSPLLLS 1011 

GAC AGT AAG AGT TCA AGT ACA TTG ACC AAA GGT TGC CAT GAC AGC CCT CTG CTC TTG AGT 3033 

TGGPGDPESLLGTVSHAQEL 1031 

ACC GGT GGA CCT GGG GAC CCT GAA TCT TTG CTA GGA ACT GTG AGT CAT GCC CAA GAA TTG 3093 

GIHPYGHQPTAAFSKNKVPS 1051 

GGG ATA CAT CCC TAT GGT CAT CAG CCA ACT GCT GCA TTC AGT AAA AAT AAG GTG CCC AGC 3153 

REPVIGNCMDRSSPGQAVEL 1071 

AGA GAG CCT GTC ATA GGG AAC TGC ATG GAT AGA AGT TCT CCA GGA CAA GCA GTG GAG CTG 3213 

PDHNGLGYPARPSVHEHHRP 1091 

CCG GAT CAC AAT GGG CTC GGG TAC CCA GCA CGC CCC TCC GTC CAT GAG CAC CAC AGG CCC 3273 

RALQRHHTIQNSDDAYVQLD 1111 

CGG GCC CTC CAG AGA CAC CAC ACG ATC CAG AAC AGC GAC GAT GCT TAT GTA CAG CTG GAT 3333 

NLPGMSLVAGKALSSARMSD 1131 

AAC TTG CCA GGA ATG AGT CTC GTG GCT GGG AAA GCA CTT AGC TCT GCC CGG ATG TCG GAT 3393 

AVLSQSSLMGSQQFQDGENE 1151 

GCA GTT CTC AGT CAG TCT TCG CTC ATG GGC AGC CAG CAG TTT CAG GAT GGG GAA AAT GAG 3453 

ECGASLGGHEHPDLSDGSQH 1171 

GAA TGT GGG GCA AGC CTG GGA GGT CAT GAG CAC CCA GAC CTG AGT GAT GGC AGC CAG CAT 3513 

LNSSCYPSTCITDILLSYKH 1191 
TTA AAC TCC TCT TGC TAT CCA TCT ACG TGT ATT ACA GAC ATT CTG CTC AGC TAC AAG CAC 3573

PEVSFSMEQAGV* 1204 
CCC GAA GTC TCC TTC AGC ATG GAG CAG GCA GGC GTG TAA 3612 

CAAGAAACAGAGAGTTTTGTGTACAGCTTGGGAATGAAAAGGTTGATTGTAAACCCACAGTATCTAGCAGCGTTGTGCC 

AAATTGCCCTTGTGTTTCTCTCCACCCAAAATATCACAGCTGCTTTCCTCACATTTGGTTCATCCGTGTGCTGTTCTTT 

TGGGTTCTGAGAGGGTTTTGCCATGTTTGCTTGTATGACCAAGTCACCAAGGAAATAAACAGGAAGGAAATCCATGTTC 

TCCATCTTTTGTGAAAGTATATTTGAGTTGGTGGTTTTTTGTTTTGTTTGGGGGTTTGTGTTTTGTTTTGTTTTTGGTA 

TGTTTTCTTCCAGAGGTGATATACTTTCTTTTTTTTCTTCCTTTCTTTTTTTTCTTTCGTTCCTTTTTTGAAACAGGAG 

AGCAAAGCAGTTAGAGTTCAGAGGCCAGCGGCCTCAGGGCCACTCCCTCCCTAGCCTTCATCAGCAGAGCACCCTCCAT 

CCCCCTGCATTGCTCTTCTGTGAAAGCAAATACTAAAGGATGCCATCCTCTGGAATCCTAATGGCAGGCAAAGGGAGAG 

AGGAAGGGTGACGGCTTCTGGCACTTAGAAAACAAAAAGAACAAAAAAAGAGAAACCCCCAAGCCTGGAACGCAGAGAG

GTCTTTACTGCTGGGATCCACGGAAAACATGTCTGTCCTAGCCAAGATCATATGAAGAGTTTGGCACGGAGGCTGAGAA 

TGACCTGGCATAGATGGTTTGCCAGTTAGGATGTCTCAATTTGAGCCTTTGCTTTTGGTGGATAACTCAGCTCCCCTCT 

TGTAACCTGGAAAGTTGGTTGCCTTTATCATCCTGCTGGTTTTATCCATGGACTGAACACCCAACAGCAGTGCACTATG 

CTTTCTATGGCATCTTTCATTCTCATTTTATATTGTGCTATAAAAAGGATTGTTTCTCCATATATATATTATATATGTG 

TGTATATATATAATATAATATATGTGTATATATATATTATATATATAATATATAATATATATATTATATATATATTATA 

TATATAATATATATATAAAATATATATATATATGCTCTCCTCTTTCAGCCTCTTTGTCACAGGGAAGAAGTGTAGGAGG 

TTGCCTTGGGCCCTGCCTCTCTCCTAACCTCCTCTTCCCCACTGGGTACCCTCAGCCCCTATATTTTAATTCTTGATCA 

TGTAGAAATTGTTTTTGGTAAATGTTGATATTATTGTTATTATCATTATTAATAAATAAAGAGAAAAGGAATTTTTGTT 

TAAATGAGAAATGTTTAACCAGATTCTGTTCTATTTGAATTGTGACTTGCACCTTTTGTTCAAAGTATTTCCTTTAGGC 

ATTGTAATTGTGAACAGCTCTTACTTGTGCCAGTGACAGATGCAGTGGTCTCCTTTCCCCAGTTGAAGCAGTGCATACG 

CAGTAGCTATTATTTGTGTTATCTTTATTTCTCTTCATTGTTAGAAACCAAAGTCTTCTCTGCTGGCTGGGGCTGAGAG 

AGGGTCTGGGTTATCTCCTTCTGATCTTCAAAACAAGAGAGAGACCTTGAATACACTGACTCTTCCACCCTTTTTTTTT 

CTGGGAAAGGAGAGCAAGAGGTCCCGAGTCCCCTCCTAGTCTTTCATCCTGAATTTGCACAGAGGAAAGCGGGTGCCCG 

GCATGGCCATCCTGATGTTGCTGGCGGGATCCCCATGCACCTTGTCCTTCTCCACTGATACTGGCAGCTCGGCTCCTGG 

ACCCAAGATCCCTTGAGTGGAATTCTGCAGTGCAAGAGCCCTTCGTGGGAGCTGTCCCATGTTTCCATGGTCCCCAGTC 

TCCCCTCCACTTGGTGGGGTCACCAACTACTCACCAGAAGGGGGCTTACCAAGAAAGCCCTAAAAAGCTGTTGACTTAT 

CTGCGCTTGTTCCAACTCTTATGCCCCCAACCTGCCCTACCACCACCACGCGCTCAGCCTGATGTGTTTACATGGTACT 

GTATGTATGGGAGAGCAGACTGCACCCTCCAGCAACAACAGATGAAAGCCAGTGAGCCTACTAACCGTGCCATCTTGCA 

AACTACACTTTAAAAAAAACTCATTGCTTTGTATTGTAGTAACCAATATGTGCAGTATACGTTGAATGTATATGAACAT 

ACTTTCCTATTTCTGTTCTTTGAAAATGTCAGAAATATTTTTTTCTTTCTCATTTTATGTTGAACTAAAAAGGATTAAA 
tccgcgagggcatcagacggcggctgattagctccggtttgcatcacccggaccgggggattagctccggtttgcatca
cccggaccgggggattagctccggtttgcatcacccggaccgggggattagctccggtttgcatcacccggaccggggg 
attagctccggtttgcatcacccggaccgggggccgggcgcgcacgagactcgcagcggaagtggaggcggctccgcgc 
gcgtccgctgctaggacccgggcagggctggagctgggctgggatcccgagctcggcagcagcgcagcgggccggccca 

cctgctggtgccctggaggctctgagccccggcggcgcccgggcccacgcggaacgacggggcgag atg cga gcc 9 

tplaapagslsrkkrleldd 23 

acc cct ctg gct gct cct gcg ggt tcc ctg tcc agg aag aag cgg ttg gag ttg gat gac 69 

nldterpvqkrarsgpqprl 43 

aac tta gat acc gag cgt ccc gtc cag aaa cga gct cga agt ggg ccc cag ccc aga ctg 129 

ppcllplspptapdratava 63 

ccc ccc tgc ctg ttg ccc ctg agc cca cct act gct cca gat cgt gca act gct gtg gcc 189 

tasrlgpyvllepeeggray 83 

act gcc tcc cgt ctt ggg ccc tat gtc ctc ctg gag ccc gag gag ggc ggg cgg gcc tac 249 

qalhcptgteytckvypvqe 103 

cag gcc ctg cac tgc cct aca ggc act gag tat acc tgc aag gtg tac ccc gtc cag gaa 309 

alavlepyarlpphkhvarp 123 

gcc ctg gcc gtg ctg gag ccc tac gcg cgg ctg ccc ccg cac aag cat gtg gct cgg ccc 369 

tevlagtqllyafftrthgd 143 

act gag gtc ctg gct ggt acc cag ctc ctc tac gcc ttt ttc act cgg acc cat ggg gac 429 

mhslvrsrhripepeaavlf 163 

atg cac agc ctg gtg cga agc cgc cac cgt atc cct gag cct gag gct gcc gtg ctc ttc 489 

rqmatalahchqhglvlrdl 183

cgc cag atg gcc acc gcc ctg gcg cac tgt cac cag cac ggt ctg gtc ctg cgt gat ctc 549 

klcrfvfadrerkklvlenl 203 

aag ctg tgt cgc ttt gtc ttc gct gac cgt gag agg aag aag ctg gtg ctg gag aac ctg 609 

edscvltgpddslwdkhacp 223 

gag gac tcc tgc gtg ctg act ggg cca gat gat tcc ctg tgg gac aag cac gcg tgc cca 669 

ayvgpeilssrasysgkaad 243 

gcc tac gtg gga cct gag ata ctc agc tca cgg gcc tca tac tcg ggc aag gca gcc gat 729 

vwslgvalftmlaghypfqd 263 

gtc tgg agc ctg ggc gtg gcg ctc ttc acc atg ctg gcc ggc cac tac ccc ttc cag gac 789 

sepvllfgkirrgayalpag 283 

tcg gag cct gtc ctg ctc ttc ggc aag atc cgc cgc ggg gcc tac gcc ttg cct gca ggc 849 

lsaparclvrcllrrepaer 303 

ctc tcg gcc cct gcc cgc tgt ctg gtt cgc tgc ctc ctt cgt cgg gag cca gct gaa cgg 909 

ltatgillhpwlrqdpmpla 323 

ctc aca gcc aca ggc atc ctc ctg cac ccc tgg ctg cga cag gac ccg atg ccc tta gct 969 
PTRSHLWEAAQVVPDGLGLD 34
CCA ACC CGA TCC CAT CTC TGG GAG GCT GCC CAG GTG GTC CCT GAT GGA CTG GGG CTG GAC 102 

EAREEEGDREVVLYG * 3S 
GAA GCC AGG GAA GAG GAG GGA GAC AGA GAA GTG GTT CTG TAT GGC TAG 107 

GACCACCCTACTACACGCTCAGCTGCCAACAGTGGATTGAGTTTGGGGGTAGCTCCAAGCCTTCTCCTGCCTCTGAACT 
GAGCCAAACCTTCAGTGCCTTCCAGAAGGGAGAAAGGCAGAAGCCTGTGTGGAGTGTGCTGTGTACACATCTGCTTTGT 
TCCACACACATGCAGTTCCTGCTTGGGTGCTTATCAGGTGCCAAGCCCTGTTCTCGGTGCTGGGAGTACAGCAGTGAGC 
AAAGGAGACAATATTCCCTGCTCACAGAGATGACAAACTGGCATCCTTGAGCTGACAACACTTTTCCATGACCATAGGT 
CACTGTCTACACTGGGTACACTTTGTACCAGTGTCGGCCTCCACTGATGCTGGTGCTCAGGCACCTCTGTCCAAGGACA 
ATCCCTTTCACAAACAAACCAGCTGCCTTTGTATCTTGTACCTTTTCAGAGAAAGGGAGGTATCCCTGTGCCAAAGGCT 
CCAGGCCTCTCCCCTGCAACTCAGGACCCAAGCCCAGCTCACTCTGGGAACTGTGTTCCCAGCATCTCTGTCCTCTTGA 
TTAAGAGATTCTCCTTCCAGGCCTAAGCCTGGGATTTGGGCCAGAGATAAGAATCCAAACTATGAGGCTAGTTCTTGTC 
TAACTCAAGACTGTTCTGGAATGAGGGTCCAGGCCTGTCAACCATGGGGCTTCTGACCTGAGCACCAAGGTTGAGGGAC 
AGGATTAGGCAGGGTCTGTCCTGTGGCCACCTGGAAAGTCCCAGGTGGGACTCTTCTGGGGACACTTGGGGTCCACAAT 
CCCAGGTCCATACTCTAGGTTTTGGATACCATGAGTATGTATGTTTACCTGTGCCTAATAAAGGAGAATTATGAAATAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
Sequence length 2162

M 1 

CACGCGTCCGCCCACGCGTCCGCGCCGGTGGTGGCGGCAGCGGCGGCTGCGGGGGCACCGGGCCGCGGCGCCACC ATG 3 

AVRQALGRGLQLGRALLLRF 21 

GCG GTG CGA CAG GCG CTG GGC CGC GGC CTG CAG CTG GGT CGA GCG CTG CTG CTG CGC TTC 63 

TGKPGRAYGLGRPGPAAGCV 41 

ACG GGC AAG CCC GGC CGG GCC TAC GGC TTG GGG CGG CCG GGC CCG GCG GCG GGC TGT GTC 123 

RGERPGWAAGPGAEPRRVGL 61 

CGC GGG GAG CGT CCA GGC TGG GCC GCA GGA CCG GGC GCG GAG CCT CGC AGG GTC GGG CTC 183 

GLPNRLRFFRQSVAGLAARL 81 

GGG CTC CCT AAC CGT CTC CGC TTC TTC CGC CAG TCG GTG GCC GGG CTG GCG GCG CGG TTG 243 

QRQFVVRAWGCAGPCGRAVF 101 

CAG CGG CAG TTC GTG GTG CGG GCC TGG GGC TGC GCG GGC CCT TGC GGC CGG GCA GTC TTT 303 

LAFGLGLGLIEEKQAESRRA 121 

CTG GCC TTC GGG CTA GGG CTG GGC CTC ATC GAG GAA AAA CAG GCG GAG AGC CGG CGG GCG 363

VSACQEIQAIFTQKSKPGPD 141

GTC TCG GCC TGT CAG GAG ATC CAG GCA ATT TTT ACC CAG AAA AGC AAG CCG GGG CCT GAC 423 

PLDTRRLQGFRLEEYLIGQS 161 

CCG TTG GAC ACG AGA CGC TTG CAG GGC TTT CGG CTG GAG GAG TAT CTG ATA GGG CAG TCC 483 

IGKGCSAAVYEATMPTLPQN 181 

ATT GGT AAG GGC TGC AGT GCT GCT GTG TAT GAA GCC ACC ATG CCT ACA TTG CCC CAG AAC 543 

LEVTKSTGLLPGRGPGTSAP 201 

CTG GAG GTG ACA AAG AGC ACC GGG TTG CTT CCA GGG AGA GGC CCA GGT ACC AGT GCA CCA 603 

GEGQERAPGAPAFPLAIKMM 221 

GGA GAA GGG CAG GAG CGA GCT CCG GGG GCC CCT GCC TTC CCC TTG GCC ATC AAG ATG ATG 663 

WNISAGSSSEAILNTMSQEL 241 

TGG AAC ATC TCG GCA GGT TCC TCC AGC GAA GCC ATC TTG AAC ACA ATG AGC CAG GAG CTG 723 

VPASRVALAGEYGAVTYRKS 261 

GTC CCA GCG AGC CGA GTG GCC TTG GCT GGG GAG TAT GGA GCA GTC ACT TAC AGA AAA TCC 783 

KRGPKQLAPHPNIIRVLRAF 281 

AAG AGA GGT CCC AAG CAA CTA GCC CCT CAC CCC AAC ATC ATC CGG GTT CTC CGC GCC TTC 843 

TSSVPLLPGALVDYPDVLPS 301 

ACC TCT TCC GTG CCG CTG CTG CCA GGG GCC CTG GTC GAC TAC CCT GAT GTG CTG CCC TCA 903 

RLHPEGLGHGRTLFLVMKNY 321 

CGC CTC CAC CCT GAA GGC CTG GGC CAT GGC CGG ACG CTG TTC CTC GTT ATG AAG AAC TAT 963 

PCTLRQYLCVNTPSPRLAAM 341 

CCC TGT ACC CTG CGC CAG TAC CTT TGT GTG AAC ACA CCC AGC CCC CGC CTC GCC GCC ATG 1023 

MLLQLLEGVDHLVQQGIAHR 361 

ATG CTG CTG CAG CTG CTG GAA GGC GTG GAC CAT CTG GTT CAA CAG GGC ATC GCG CAC AGA 1083 

DLKSDNILVELDPDGCPWLV 381 
GAC CTG AAA TCC GAC AAC ATC CTT GTG GAG CTG GAC CCA GAC GGC TGC CCC TGG CTG GTG 1143

IADFGCCLADESIGLQLPFS 401 

ATC GCA GAT TTT GGC TGC TGC CTG GCT GAT GAG AGC ATC GGC CTG CAG TTG CCC TTC AGC 1203 

SWYVDRGGNGCLMAPEVSTA 421 

AGC TGG TAC GTG GAT CGG GGC GGA AAC GGC TGT CTG ATG GCC CCA GAG GTG TCC ACG GCC 1263 

RPGPRAVIDYSKADAWAVGA 441 

CGT CCT GGC CCC AGG GCA GTG ATT GAC TAC AGC AAG GCT GAT GCC TGG GCA GTG GGA GCC 1323 

IAYEIFGLVNPFYGQGKAHL 461 

ATC GCC TAT GAA ATC TTC GGG CTT GTC AAT CCC TTC TAC GGC CAG GGC AAG GCC CAC CTT 1383 

ESRSYQEAQLPALPESVPPD 481 

GAA AGC CGC AGC TAC CAA GAG GCT CAG CTA CCT GCA CTG CCC GAG TCA GTG CCT CCA GAC 1443 

VRQLVRALLQREASKRPSAR 501 

GTG AGA CAG TTG GTG AGG GCA CTG CTC CAG CGA GAG GCC AGC AAG AGA CCA TCT GCC CGA 1503 

VAANVLHLSLWGEHILALKN 521 

GTA GCC GCA AAT GTG CTT CAT CTA AGC CTC TGG GGT GAA CAT ATT CTA GCC CTG AAG AAT 1563 

LKLDKMVGWLLQQSAATLLA 541 

CTG AAG TTA GAC AAG ATG GTT GGC TGG CTC CTC CAA CAA TCG GCC GCC ACT TTG TTG GCC 1623 

NRLTEKCCVETKMKMLFLAN 561 

AAC AGG CTC ACA GAG AAG TGT TGT GTG GAA ACA AAA ATG AAG ATG CTC TTT CTG GCT AAC 1683 

LECETLCQAALLLCSWRAAL 581 

CTG GAG TGT GAA ACG CTC TGC CAG GCA GCC CTC CTC CTC TGC TCA TGG AGG GCA GCC CTG 1743 

* 582 

TGA 1746 

TGTCCCTGCATGGAGCTGGTGAATTACTAAAAGAACTTGGCATCCTCTGTGTCGTGATGGTCTGTGAATGGTGAGGGTG 

GGAGTCAGGAGACAAGACAGCGCAGAGAGGGCTGGTTAGCCGGAAAAGGCCTCGGGCTTGGCAAATGGAAGAACTTGAG 

TGAGAGTTCAGTCTGCAGTCCTGTGCTCACAGACATCCGAAAAGTGAATGGCCAAGCTGGTCTAGTAGATGAGGCTGGA 

CTGAGGAGGGGTAGGCCTGCATCCACAGAGAGGATCCAGGCCAAGGCACTGGCTGTCAGTGGCAGAGTTTGGCTGTGAC 

CTTTGCCCCTAACACGAGGAACTCG 
Input file Fbh2193fl; Output File 2193.trans
Sequence length 1826 

MGSSMSAATARRPVF 15 

CCACGCGTCCGAGAGG ATG GGC TCG TCC ATG TCG GCG GCC ACC GCG CGG AGG CCG GTG TTT 45 

DDKEDVNFDHFQILRAIGKG 35 

GAC GAC AAG GAG GAC GTG AAC TTC GAC CAC TTC CAG ATC CTT CGG GCC ATT GGG AAG GGC 105 

SFGKVCIVQKRDTEKMYAMK 55 

AGC TTT GGC AAG GTG TGC ATT GTG CAG AAG CGG GAC ACG GAG AAG ATG TAC GCC ATG AAG 165 

YMNKQQCIERDEVRNVFREL 75 

TAC ATG AAC AAG CAG CAG TGC ATC GAG CGC GAC GAG GTC CGC AAC GTC TTC CGG GAG CTG 225 

EILQEIEHVFLVNLWYSFQD 95 

GAG ATC CTG CAG GAG ATC GAG CAC GTC TTC CTG GTG AAC CTC TGG TAC TCC TTC CAG GAC 285 

EEDMFMVVDLLLGGDLRYHL 115 

GAG GAG GAC ATG TTC ATG GTC GTG GAC CTG CTA CTG GGC GGG GAC CTG CGC TAC CAC CTG 345 

QQNVQFSEDTVRLYICEMAL 135 

CAG CAG AAC GTG CAG TTC TCC GAG GAC ACG GTG AGG CTG TAC ATC TGC GAG ATG GCA CTG 405 

ALDYLRGQHIIHRDVKPDNI 155 

GCT CTG GAC TAC CTG CGC GGC CAG CAC ATC ATC CAC AGA GAT GTC AAG CCT GAC AAC ATT 465 

LLDERGHAHLTDFNIATIIK 175 

CTC CTG GAT GAG AGA GGA CAT GCA CAC CTG ACC GAC TTC AAC ATT GCC ACC ATC ATC AAG 525 

DGERATALAGTKPYMAPEIF 195 

GAC GGG GAG CGG GCG ACG GCA TTA GCA GGC ACC AAG CCG TAC ATG GCT CCG GAG ATC TTC 585 

HSFVNGGTGYSFEVDWWSVG 215 

CAC TCT TTT GTC AAC GGC GGG ACC GGC TAC TCC TTC GAG GTG GAC TGG TGG TCG GTG GGG 645 

VMAYELLRGWRPYDIHSSNA 235 

GTG ATG GCC TAT GAG CTG CTG CGA GGA TGG AGG CCC TAT GAC ATC CAC TCC AGC AAC GCC 705 

VESLVQLFSTVSVQYVPTWS 255 

GTG GAG TCC CTG GTG CAG CTG TTC AGC ACC GTG AGC GTC CAG TAT GTC CCC ACG TGG TCC 765 

KEMVALLRKLLTVNPEHRLS 275 

AAG GAG ATG GTG GCC TTG CTG CGG AAG CTC CTC ACT GTG AAC CCC GAG CAC CGG CTC TCC 825 

SLQDVQAAPALAGVLWDHLS 295 

AGC CTC CAG GAC GTG CAG GCA GCC CCG GCG CTG GCC GGC GTG CTG TGG GAC CAC CTG AGC 885 

EKRVEPGFVPNKGRLHCDPT 315 

GAG AAG AGG GTG GAG CCG GGC TTC GTG CCC AAC AAA GGC CGT CTG CAC TGC GAC CCC ACC 945 

FELEEMILESRPLHKKKKRL 335 

TTT GAG CTG GAG GAG ATG ATC CTG GAG TCC AGG CCC CTG CAC AAG AAG AAG AAG CGT CTG 1005 

AKNKSRDNSRDSSQSENDYL 355 

GCC AAG AAC AAG TCC CGG GAC AAC AGC AGG GAC AGC TCC CAG TCC GAG AAT GAC TAT CTT 1065 

QDCLDAIQQDFVIFNREKLK 375 

CAA GAC TGC CTC GAT GCC ATC CAG CAA GAC TTC GTG ATT TTT AAC AGA GAA AAG CTG AAG 1125 

RSQDLPREPLPAPESRDAAE 395 
AGG AGC CAG GAC CTC CCG AGG GAG CCT CTC CCC GCC CCT GAG TCC AGG GAT GCT GCG GAG 1185

PVEDEAERSALPMCGPICPS 415 
CCT GTG GAG GAC GAG GCG GAA CGC TCC GCC CTG CCC ATG TGC GGC CCC ATT TGC CCC TCG 1245 

AGSG* 42
GCC GGG AGC GGC TAG 126 

GCCGGGACGCCCGTGGTCCTCACCCCTTGAGCTGCTTTGGAGACTCGGCTGCCAGAGGGAGGGCCATGGGCCGAGGCCT 

GGCATTCACGTTCCCACCCAGCCTGGCTGGCGGTGCCCACAGTGCCCCGGACACATTTCACACCTCAGGCTCGTGGTGG 

TGCAGGGGACAAGAGGCTGTGGGTGCAGGGGACACCTGTGGAGGGCATTTCCCGTGGGCCCCCGAGACCCGCCTAGATG 

GAGGAAGCGCTGCTGGGCGCCCTCTTACCGCTCACGGGGAGCTGGGGCCATGGATGGGACAGGAGTCTTTGTCCCTGCT 

CAGCCCGGAGGCTGTGCACGGCCCTCGTCACAAGGTGACCCTTGCAGCACAGGCCGCGGGTGCCCCAGGCTCGGCTCAG 

GTCTTGGAGGTCAAGGGCATGGGTTGGGGTAGTGGGTGGGGAGGTGAATGTTTTCTAGAGATTCAAACTGCTCCAGCAA 

TTTCTGTAGTTTTCACCTCTGAGAATTACAATGTGAGAACCGCTCGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
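The running nucleotide count printed at the right of each codon line can be checked mechanically, which helps flag counters that were clipped or misread during reproduction of the sheets. The sketch below assumes each line consists of whitespace-separated codons followed by a single integer counter; `check_counters` is a hypothetical helper, not part of the original listing.

```python
def check_counters(lines, start=0):
    """Return (line, expected_count) for every line whose printed counter
    does not equal the cumulative nucleotide count.

    `start` is the count reached before the first line in `lines`.
    """
    total = start
    problems = []
    for text in lines:
        *codons, printed = text.split()
        total += 3 * len(codons)  # each codon contributes three nucleotides
        if int(printed) != total:
            problems.append((text, total))
    return problems


# Last two codon lines of the 2193 listing; the preceding line ends at 1185.
tail = [
    "CCT GTG GAG GAC GAG GCG GAA CGC TCC GCC CTG CCC ATG TGC GGC CCC ATT TGC CCC TCG 1245",
    "GCC GGG AGC GGC TAG 126",
]
for line, expected in check_counters(tail, start=1185):
    print(f"counter mismatch, expected {expected}: {line}")
```

A mismatch only indicates that the printed counter and the codon count disagree; it does not by itself say which of the two is wrong.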
Input file Fbh2249fl.seq; Output File 2249.trans
Sequence length 2870 

GTCGACCCACGCGTCCGGGTTACTTCCGGGTCGGACGGCGCTAGCTGCAGCATCGGAGTGTGGCAGTGCTGGGCTGGCC 

MQGGNSGVRKR 11 

GGCGGGCTGGGCTGCGGCCCGCGCGCGGCCGGCG ATG CAG GGG GGC AAC TCC GGG GTC CGC AAG CGC 33 

EEEGDGAGAVAAPPAIDFPA 31 

GAA GAG GAG GGC GAC GGG GCT GGG GCT GTG GCT GCG CCG CCG GCC ATC GAC TTT CCC GCC 93 

EGPDPEYDESDVPAEIQVLK 51 

GAG GGC CCG GAC CCC GAA TAT GAC GAA TCT GAT GTT CCA GCA GAA ATC CAG GTG TTA AAA 153 

EPLQQPTFPFAVANQLLLVS 71 

GAA CCC CTA CAA CAG CCA ACC TTC CCT TTT GCA GTT GCA AAC CAA CTC TTG CTG GTT TCT 213 

LLEHLSHVHEPNPLRSRQVF 91 

TTG CTG GAG CAC TTG AGC CAC GTG CAT GAA CCA AAC CCA CTT CGT TCA AGA CAG GTG TTT 273 

KLLCQTFIKMGLLSSFTCSD 111 

AAG CTA CTT TGC CAG ACG TTT ATC AAA ATG GGG CTG CTG TCT TCT TTC ACT TGT AGT GAC 333 

EFSSLRLHHNRAITHLMRSA 131 

GAG TTT AGC TCA TTG AGA CTA CAT CAC AAC AGA GCT ATT ACT CAC TTA ATG AGG TCT GCT 393 

KERVRQDPCEDISRIQKIRS 151 

AAA GAG AGA GTT CGT CAG GAT CCT TGT GAG GAT ATT TCT CGT ATC CAG AAA ATC AGA TCA 453 

REVALEAQTSRYLNEFEELA 171 

AGG GAA GTA GCC TTG GAA GCA CAA ACT TCA CGT TAC TTA AAT GAA TTT GAA GAA CTT GCC 513 

ILGKGGYGRVYKVRNKLDGQ 191 

ATC TTA GGA AAA GGT GGA TAC GGA AGA GTA TAC AAG GTC AGG AAT AAA TTA GAT GGT CAG 573 

YYAIKKILIKGATKTVCMKV 211 

TAT TAT GCA ATA AAA AAA ATC CTG ATT AAG GGT GCA ACT AAA ACA GTT TGC ATG AAG GTC 633 

LREVKVLAGLQHPNIVGYHT 231 

CTA CGG GAA GTG AAG GTG CTG GCA GGT CTT CAG CAC CCC AAT ATT GTT GGC TAT CAC ACC 693 

AWIEHVHVIQPRDRAAIELP 251

GCG TGG ATA GAA CAT GTT CAT GTG ATT CAG CCA CGA GAC AGA GCT GCC ATT GAG TTG CCA 753 

SLEVLSDQEEDREQCGVKND 271 

TCT CTG GAA GTG CTC TCC GAC CAG GAA GAG GAC AGA GAG CAA TGT GGT GTT AAA AAT GAT 813 

ESSSSSIIFAEPTPEKEKRF 291 

GAA AGT AGC AGC TCA TCC ATT ATC TTT GCT GAG CCC ACC CCA GAA AAA GAA AAA CGC TTT 873 

GESDTENQNNKSVKYTTNLV 311 

GGA GAA TCT GAC ACT GAA AAT CAG AAT AAC AAG TCG GTG AAG TAC ACC ACC AAT TTA GTC 933 

IRESGELESTLELQENGLAG 331 

ATA AGA GAA TCT GGT GAA CTT GAG TCG ACC CTG GAG CTC CAG GAA AAT GGC TTG GCT GGT 993 

LSASSIVEQQLPLRRNSHLE 351 

TTG TCT GCC AGT TCA ATT GTG GAA CAG CAG CTG CCA CTC AGG CGT AAT TCC CAC CTA GAG 1053 

ESFTSTEESSEENVNFLGQT 371 

GAG AGT TTC ACA TCC ACC GAA GAA TCT TCC GAA GAA AAT GTC AAC TTT TTG GGT CAG ACA 1113 
EAQYHLMLHIQMQLCELSLW 391
GAG GCA CAG TAC CAC CTG ATG CTG CAC ATC CAG ATG CAG CTG TGT GAG CTC TCG CTG TGG 1173 

DWIVERNKRGREYVDESACP 411 
GAT TGG ATA GTC GAG AGA AAC AAG CGG GGC CGG GAG TAT GTG GAC GAG TCT GCC TGT CCT 1233

YVMANVATKIFQELVEGVFY 431
TAT GTT ATG GCC AAT GTT GCA ACA AAA ATT TTT CAA GAA TTG GTA GAA GGT GTG TTT TAC 1293 

IHNMGIVHRDLKPRNIFLHG 451 
ATA CAT AAC ATG GGA ATT GTG CAC CGA GAT CTG AAG CCA AGA AAT ATT TTT CTT CAT GGC 1353 

PDQQVKIGDFGLACTDILQK 471 
CCT GAT CAG CAA GTA AAA ATA GGA GAC TTT GGT CTG GCC TGC ACA GAC ATC CTA CAG AAG 1413 

NTDWTNRNGKRTPTHTSRVG 491 
AAC ACA GAC TGG ACC AAC AGA AAC GGG AAG AGA ACA CCA ACA CAT ACG TCC AGA GTG GGT 1473 

TCLYASPEQLEGSEYDAKSD 511 
ACT TGT CTG TAC GCT TCA CCC GAA CAG TTG GAA GGA TCT GAG TAT GAT GCC AAG TCA GAT 1533 

MYSLGVVLLELFQPFGTEME 531 
ATG TAC AGC TTG GGT GTG GTC CTG CTA GAG CTC TTT CAG CCG TTT GGA ACA GAA ATG GAG 1593 

RAEVLTGLRTGQLPESLRKR 551 
CGA GCA GAA GTT CTA ACA GGT TTA AGA ACT GGT CAG TTG CCG GAA TCC CTC CGT AAA AGG 1653 

CPVQAKYIQHLTRRNSSQRP 571
TGT CCG GTG CAA GCC AAG TAT ATC CAG CAC TTA ACG AGA AGG AAC TCA TCG CAG AGA CCA 1713 

SAIQLLQSELFQNSGNVNLT 591
TCT GCC ATT CAG CTG CTG CAG AGT GAA CTT TTC CAA AAT TCT GGA AAT GTT AAC CTC ACC 1773 

LQMKIIEQEKEIAELKKQLN 611
CTA CAG ATG AAG ATA ATA GAG CAA GAA AAA GAA ATT GCA GAA CTA AAG AAG CAG CTA AAC 1833 

LLSQDKGVRDDGKDGGVG* 630 
CTC CTT TCT CAA GAC AAA GGG GTG AGG GAT GAC GGA AAG GAT GGG GGC GTG GGA TGA 1890 

AAGTGGACTTAACTTTTAAGGTAGTTAACTGGAATGTAAATTTTTAATCTTTATTAGGGTATAGTTGGTACAATGCTTC 

GTTGTATTTAGTAAGCCTTTACAAGACTTGTTAAAGATGTCAGAGTGCCCCAAGCTGCCGTTCCTTCCCTTCCTGCCCC 

ACAAGCTCCTTTTCCTGAATTTCCTACCTAAATATTAACCATATGCCTAGTCTCTGAAACTAAAAACTTGGACCTCATC 

CTCAATTATTTTCTCCTTTCAACTCTGTTGACCCTCTGTCTGGTCTTCCTCTAGAAGGTTCTACCGCAGAAATTGATGT 

GTGCTCCCTGCCCTCGTCACTGCCCAAGCCCGGGCCTGCACATACTCACTGGACTGTTCCAGTTTTGACAGCTGCCAGT 

CTTCCTGCCCCTTTCACACTGCAGCTGAAGTTCATTACCTGAAGGACGCCTCATCATTTCATTCCTTGGCTCCAAACCT 

TCTGCTGCCTCTAAGATAAAAGCTCAACTTCTTAACAGTGTACAGTGTGCAACTTCCAACCTTTTTATCTGTTCTCTCC 

ACCTTCAGTTTAGCGTCATTCCAAAACCACACCCTTGCAAAGCTTTGTACTCCGCACCCCAGATGATCTCCAGGCAGCT 

CAGATCTCTTTCCTGCCTTTGCCCTGCACTGTTCCCCGGTACTTCCTCCTTTATTGTAGCACTCAGCTCCCCAGCCAAT 

CTGTACATCCCTCAGAGGCAGCGATCTGATGAATTGGTTTTTGAATCCCAGAAAGGGTCTGCCATGGAGTTGGCAGTCA 

TCACGGTAGATGGCGTATGATTTTGCTGAATTTTAAATAAAATGAAAACCATAAAAAAAAAAAAAAAGGGCGGCCGC 
Transmembrane Segments Predicted by MEMSAT
Prosite Pattern Matches for 1847.
Prosite Version: Release 12.2 of February 1995
>PS00001-PDOC00001-ASN_GLYCOSYLATION N-glycosylation site.
Query: 73 NMTH 76 
Query: 374 NSSA 377 
Query: 564 NISM 567 
Query: 663 NASN 666 
Query: 674 NMTS 677 

>PS00004-PDOC00004-CAMP_PHOSPHO_SITE cAMP- and cGMP-dependent protein kinase phosphorylation site.

Query: 700 KRRS 703

>PS00005-PDOC00005-PKC_PHOSPHO_SITE Protein kinase C phosphorylation site. 

Query: 7 SKK 9 

Query: 264 SSK 266 

Query: 293 SRK 295 

Query: 320 SPK 322 

Query: 381 TGR 383 

Query: 442 SLK 444 

Query: 514 TDK 516 

Query: 544 SKR 546 

Query: 590 SDR 592 

Query: 593 SIK 595 

Query: 619 SPK 621 

Query: 648 SKR 650 

Query: 698 TQK 700 

Query: 722 TPK 724 

Query: 725 SVR 727 

Query: 821 TKR 823 

>PS00006-PDOC00006-CK2_PHOSPHO_SITE Casein kinase II phosphorylation site. 

Query: 7 SKKE 10 

Query: 31 SIEE 34 

Query: 270 SCLE 273 

Query: 311 SSVE 314 

Query: 464 TCVE 467 

Query: 512 SFTD 515 

Query: 544 SKRD 547 

Query: 556 SDDD 559 

Query: 577 SIME 580 

Query: 581 SPLE 584 

Query: 593 SIKE 596 

Query: 597 SSFE 600 

Query: 602 SNIE 605 

Query: 676 TSLD 679 

Query: 810 SAVE 813 

Query: 817 TIDD 820 

Query: 836 SDVD 839 

Query: 861 SYFE 864 

>PS00008-PDOC00008-MYRISTYL N-myristoylation site.
Query: 12 GGGAAT 17 
Query: 52 GQKGGK 57 
Query: 209 GQVLSL 214 
>PS00009-PDOC00009-AMIDATION Amidation site.
Query: 438 LGKR 441 

>PS00108-PDOC00100-PROTEIN_KINASE_ST Serine/Threonine protein kinases active-site signature.
Query: 152 IIHRDLKPDNMLI 164

FIGURE 7B 
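Each "Query" line above reports the start position, matched residues, and end position of one PROSITE pattern hit in the predicted protein. A scan of this kind can be sketched by converting PROSITE pattern syntax into a regular expression. The pattern strings below are assumptions: they are the forms commonly published for these PROSITE entries, not text taken from the figure, and the helper names `prosite_to_regex` and `scan` are hypothetical.

```python
import re

# Assumed PROSITE patterns (not stated in the figure itself).
PATTERNS = {
    "ASN_GLYCOSYLATION": "N-{P}-[ST]-{P}",
    "PKC_PHOSPHO_SITE": "[ST]-x-[RK]",
    "PROTEIN_KINASE_ST": "[LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3)",
}

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def scan(pattern: str, sequence: str, offset: int = 1):
    """Return (start, match, end) for every hit, including overlapping ones.

    `offset` is the residue number of the first character of `sequence`.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [
        (m.start() + offset, m.group(1), m.start() + offset + len(m.group(1)) - 1)
        for m in regex.finditer(sequence)
    ]


# Residues 68-87 and 148-167 of the 1847 protein, copied from the listing.
print(scan(PATTERNS["ASN_GLYCOSYLATION"], "DMINKNMTHQVQAERDALAL", offset=68))
print(scan(PATTERNS["PROTEIN_KINASE_ST"], "HRHGIIHRDLKPDNMLISNE", offset=148))
```

The lookahead wrapper is a design choice so that overlapping hits are reported; a plain `finditer` would skip a hit that begins inside a previous match.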
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Transmembrane Segments Predicted by MEMSAT 
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FIGURE 8A 



Applicant: Meyers, Rachel E., et al 
Title: NOVEL HUMAN PROTEIN KINASES AND
USES THEREFOR 
Attorney/Agent: Jean M. Silveri 
Docket No.: MPIOO-009P1RCPIDVIM 
Sheet 20 of 58 Sheets 

FIGURE 8B

Prosite Pattern Matches for 3695

Prosite Version: Release 12.2 of February 1995

>PS00001-PDOC00001-ASN_GLYCOSYLATION N-glycosylation site.
Query: 316 NISV 319 
Query: 501 NRSK 504 
Query: 586 NMSS 589 
Query: 726 NCSS 729 
Query: 1173 NSSC 1176 

>PS00002-PDOC00002-GLYCOSAMINOGLYCAN Glycosaminoglycan attachment site.
RU Additional Rules:
RU There must be at least two acidic amino acids (Glu or Asp) from -2 to
RU -4 relative to the serine.

Query: 771 SGRG 774 

>PS00004-PDOC00004-CAMP_PHOSPHO_SITE cAMP- and cGMP-dependent protein kinase phosphorylation site.

Query: 23 KRAT 26

Query: 246 KRLS 249 

Query: 360 RRHT 363 

Query: 442 RRAS 445 

Query: 504 KRHT 507 

Query: 565 RRFS 568 

>PS00005-PDOC00005-PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.

Query: 76 TER 78 

Query: 219 SGK 221 

Query: 358 SMR 360 

Query: 503 SKR 505 

Query: 547 TYK 549

Query: 558 TER 560

Query: 589 SIK 591 

Query: 771 SGR 773 

Query: 856 SPR 858 

Query: 922 TER 924 

Query: 1126 SAR 1128 

Query: 1188 SYK 1190 

>PS00006-PDOC00006-CK2_PHOSPHO_SITE Casein kinase II phosphorylation site.

Query: 42 TQLD 45 

Query: 88 SGGE 91 

Query: 230 TECE 233 

Query: 479 TPVD 482 

Query: 486 SDGE 489 

Query: 547 TYKD 550 

Query: 837 TPPD 840 

Query: 920 SMTE 923 

Query: 976 SMEE 979 

Query: 1084 SVHE 1087 

>PS00008-PDOC00008-MYRISTYL N-myristoylation site.

Query: 312 GTAMNI 317 

Query: 585 GNNSSI 590 

Query: 703 GAASSS 708 

Query: 765 GTAAGS 770 

Query: 918 GQSMTE 923 

Query: 985 GAKDGF 990 

Query: 1154 GASLGG 1159 

>PS00009-PDOC00009-AMIDATION Amidation site.

Query: 440 LGRR 443 

>PS00108-PDOC00100-PROTEIN_KINASE_ST Serine/Threonine protein kinases active-site signature.
Query: 126 IVHRDLKAENLLL 138
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Transmembrane Segments Predicted by MEMSAT 
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Prosite Pattern Matches for 13302 

Prosite version: Release 12.2 of February 1995 

>PS00005|PDOC00005|PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.

Query: 14 SRK 16 

Query: 27 TER 29 

Query: 95 TCK 97 

Query: 232 SSR 234 

Query: 238 SGK 240

>PS00006|PDOC00006|CK2_PHOSPHO_SITE Casein kinase II phosphorylation site.

Query: 54 TAPD 57 

Query: 90 TGTE 93 

Query: 140 THGD 143 

Query: 210 TGPD 213 

Query: 215 SLWD 218 

>PS00008|PDOC00008|MYRISTYL N-myristoylation site. 
Query: 91 GTEYTC 96 
Query: 341 GLDEAR 346 



FIGURE 9 
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Transmembrane Segments Predicted by MEMSAT 
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Transmembrane segments for presumed mature peptide 
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FIGURE 10A 
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Prosite Pattern Matches for 2208 

Prosite version: Release 12.2 of February 1995 

>PS00001|PDOC00001|ASN_GLYCOSYLATION N-glycosylation site. 
Query: 223 NISA 226 

>PS00004|PDOC00004|CAMP_PHOSPHO_SITE cAMP- and cGMP-dependent protein kinase phosphorylation site.
Query: 496 KRPS 499 

>PS00005|PDOC00005|PKC_PHOSPHO_SITE Protein kinase C phosphorylation site. 

Query: 22 TGK 24

Query: 118 SRR 120

Query: 133 TQK 135

Query: 145 TRR 147

Query: 257 TYR 259

Query: 261 SKR 263

Query: 324 TLR 326

Query: 335 SPR 337

Query: 420 TAR 422

Query: 495 SKR 497 

Query: 499 SAR 501 

Query: 545 TEK 547 

Query: 576 SWR 578 

>PS00006|PDOC00006|CK2_PHOSPHO_SITE Casein kinase II phosphorylation site. 
Query: 228 SSSE 231 
Query: 432 SKAD 435
Query: 465 SYQE 468

>PS00007|PDOC00007|TYR_PHOSPHO_SITE Tyrosine kinase phosphorylation site.
Query: 458 KAHLESRSY 466 

>PS00008|PDOC00008|MYRISTYL N-myristoylation site. 

Query: 10 GLQLGR 15

Query: 39 GCVRGE 44

Query: 105 GLGLGL 110

Query: 159 GQSIGK 164

Query: 165 GCSAAV 170

Query: 189 GLLPGR 194

Query: 307 GLGHGR 312

Query: 386 GCCLAD 391

Query: 408 GGNGCL 413

Query: 455 GQGKAH 460

>PS00108|PDOC00100|PROTEIN_KINASE_ST Serine/Threonine protein kinases active-site signature.
Query: 358 IAHRDLKSDNILV 370 

FIGURE 10B 
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Prosite Pattern Matches for 2193 

Prosite version: Release 12.2 of February 1995 

>PS00001|PDOC00001|ASN_GLYCOSYLATION N-glycosylation site.
Query: 338 NKSR 341

>PS00004|PDOC00004|CAMP_PHOSPHO_SITE cAMP- and cGMP-dependent protein kinase phosphorylation site.
Query: 45 KRDT 48

>PS00005|PDOC00005|PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.
Query: 9 TAR 11
Query: 48 TKK 50
Query: 125 TVR 127
Query: 295 SEK 297

>PS00006|PDOC00006|CK2_PHOSPHO_SITE Casein kinase II phosphorylation site.
Query: 92 SFQD 95
Query: 276 SLQD 279
Query: 348 SQSE 351

>PS00007|PDOC00007|TYR_PHOSPHO_SITE Tyrosine kinase phosphorylation site.
Query: 45 KRDTEKMY 52

>PS00008|PDOC00008|MYRISTYL N-myristoylation site.
Query: 2 GSSMSA 7
Query: 202 GTGYSF 207

>PS00107|PDOC00100|PROTEIN_KINASE_ATP Protein kinases ATP-binding region signature.
Query: 32 IGKGSFGKV 40

>PS00108|PDOC00100|PROTEIN_KINASE_ST Serine/Threonine protein kinases active-site signature.
Query: 145 IIHRDVKPDNILL 157 



FIGURE 11 
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Transmembrane Segments Predicted by MEMSAT 
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FIGURE 12A 
Prosite Pattern Matches for 2249
Prosite version: Release 12.2 of February 1995

>PS00001|PDOC00001|ASN_GLYCOSYLATION N-glycosylation site.
Query: 301 NKSV 304
Query: 566 NSSQ 569
Query: 589 NLTL 592

>PS00004|PDOC00004|CAMP_PHOSPHO_SITE cAMP- and cGMP-dependent protein kinase phosphorylation site.

Query: 345 KRNS 348

Query: 564 KRNS 567

>PS00005|PDOC00005|PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.

Query: 115 SLR 117 

Query: 130 SAK 132 

Query: 160 TSR 162 

Query: 303 SVK 305 

Query: 476 TNR 478 

Query: 487 TSR 489 

Query: 547 SLR 549 

Query: 563 TRR 565 

Query: 568 SQR 570 

>PS00006|PDOC00006|CK2_PHOSPHO_SITE Casein kinase II phosphorylation site.

Query: 71 SLLE 74 

Query: 108 TCSD 111 

Query: 130 SAKE 133 

Query: 257 SDQE 260 

Query: 294 SDTE 297 

Query: 320 STLE 323 

Query: 336 SIVE 339 

Query: 348 SHLE 351

Query: 355 TSTE 358 

Query: 360 SSEE 363 

Query: 389 SLWD 392 

Query: 504 SEYD 507 

Query: 528 TEME 531

>PS00007|PDOC00007|TYR_PHOSPHO_SITE Tyrosine kinase phosphorylation site.

Query: 185 RNKLDGQYY 193

>PS00008|PDOC00008|MYRISTYL N-myristoylation site.

Query: 3 GGNSGV 8 

Query: 102 GLLSSF 107 

Query: 190 GQYYAI 195 

Query: 202 GATKTV 207 

Query: 331 GLSASS 336 

Query: 369 GQTEAQ 374 

Query: 462 GLACTD 467 

Query: 538 GLRTGQ 543 



>PS00009|PDOC00009|AMIDATION Amidation site.
Query: 479 NGKR 482

>PS00107|PDOC00100|PROTEIN_KINASE_ATP Protein kinases ATP-binding region signature.
Query: 173 LGKGGYGRV 181

>PS00108|PDOC00100|PROTEIN_KINASE_ST Serine/Threonine protein kinases active-site signature.
Query: 437 IVHRDLKPRNIFL 449 

FIGURE 12B 
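
The five serine/threonine kinase active-site matches reported in the Prosite listings can be checked against the PS00108 pattern. This is a minimal sketch: the pattern is the commonly cited PROSITE form (the exact Release 12.2 wording is assumed to be equivalent), and `ST_KINASE_ACTIVE_SITE`, `REPORTED` and `is_active_site` are hypothetical names.

```python
import re

# PS00108 as commonly cited (assumed, not copied from Release 12.2):
# [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3)
ST_KINASE_ACTIVE_SITE = re.compile(
    r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}"
)

# Active-site segments as reported for each query in the Prosite listings.
REPORTED = {
    "18477": "IIHRDLKPDNMLI",
    "3695": "IVHRDLKAENLLL",
    "2208": "IAHRDLKSDNILV",
    "2193": "IIHRDVKPDNILL",
    "2249": "IVHRDLKPRNIFL",
}


def is_active_site(segment):
    """True when the whole 13-residue segment fits the signature."""
    return ST_KINASE_ACTIVE_SITE.fullmatch(segment) is not None


for query, segment in REPORTED.items():
    print(query, segment, is_active_site(segment))
```

All five segments fit the pattern; replacing the invariant lysine (position 7) with another residue makes the match fail.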
PSORT Prediction of Protein Localization

MITDISC: discrimination of mitochondrial targeting seq 

R content: 0 Hyd Moment (75): 8.63 

Hyd Moment (95): 1.01 G content: 1 
D/E content: 2 S/T content: 2 

Score: -7.17

Gavel: prediction of cleavage sites for mitochondrial preseq 

cleavage site motif not found 
NUCDISC: discrimination of nuclear localization signals 

pat4 : none 

pat7: PTQKRRS (4) at 697 
bipartite: none 

content of basic residues: 10.6% 
NLS Score: -0.13 

Final Results (k = 9/23) : 

91.3%: nuclear 
8.7%: cytoplasmic 

prediction for 18477 is nuc (k=23) 



Start  End  Feature                                  Seq
522    525  VAC: possible vacuolar targeting motif   KLPI



FIGURE 30 
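
The "Final Results" percentages in a PSORT report of this kind are consistent with vote shares among the 23 nearest neighbours (k=23). The sketch below works under that assumption; the counts of 21 and 2 are inferred from the printed 91.3% and 8.7%, not stated in the figure, and `vote_percentages` is a hypothetical helper.

```python
from collections import Counter


def vote_percentages(neighbour_labels):
    """Share of each localization label among the neighbours, in percent."""
    counts = Counter(neighbour_labels)
    total = len(neighbour_labels)
    return [
        (label, round(100.0 * n / total, 1))
        for label, n in counts.most_common()
    ]


# Assumed neighbour labels for query 18477: 21 nuclear and 2 cytoplasmic.
neighbours = ["nuclear"] * 21 + ["cytoplasmic"] * 2
for label, pct in vote_percentages(neighbours):
    print(f"{pct}%: {label}")
```

With these assumed counts the output is 91.3% nuclear and 8.7% cytoplasmic, matching the figures printed for 18477.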






PSORT Prediction of Protein Localization

MITDISC: discrimination of mitochondrial targeting seq 

R content: 3 Hyd Moment (75): 4.75 

Hyd Moment (95): 7.01 G content: 1 
D/E content: 1 S/T content: 3 

Score: -1.91 

Gavel: prediction of cleavage sites for mitochondrial preseq 

R-2 motif at 28 KRL | EL 

NUCDISC: discrimination of nuclear localization signals 

pat4: RKKR (5) at 15 
pat7: PVQKRAR (3) at 30 
bipartite: none 

content of basic residues: 12.3% 
NLS Score: 0.10 

ER Membrane Retention Signals: 

XXRR-like motif in the N-terminus: RATP 

none 

Final Results (k = 9/23):

39.1%: mitochondrial
26.1%: cytoplasmic
13.0%: nuclear
8.7%: peroxisomal
4.3%: extracellular, including cell wall
4.3%: Golgi
4.3%: endoplasmic reticulum



prediction for 13302 is mit (k=23) 
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FIGURE 31 
PSORT Prediction of Protein Localization

MITDISC: discrimination of mitochondrial targeting seq 

R content: 7 Hyd Moment (75): 10.75 

Hyd Moment (95): 7.34 G content: 10 
D/E content: 1 S/T content: 1 

Score: -2.28

Gavel: prediction of cleavage sites for mitochondrial preseq 

R-2 motif at 52 VRG | ER 

NUCDISC: discrimination of nuclear localization signals 

pat4 : none 
pat7: none 
bipartite: none 

content of basic residues: 11.4% 
NLS Score: -0.47 

ER Membrane Retention Signals: 

XXRR-like motif in the N-terminus: AVRQ 

none 

Final Results (k = 9/23):

52.2%: mitochondrial

30.4%: cytoplasmic

13.0%: nuclear 

4.3%: peroxisomal 

prediction for 2208 is mit (k=23) 
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FIGURE 32 
PSORT Prediction of Protein Localization

MITDISC: discrimination of mitochondrial targeting seq 

R content: 2 Hyd Moment (75): 1.80 

Hyd Moment (95): 2.79 G content: 1 
D/E content: 1 S/T content: 4 

Score: -3.43 

Gavel: prediction of cleavage sites for mitochondrial preseq 

R-2 motif at 22 RRP|VF 

NUCDISC: discrimination of nuclear localization signals 

pat4: HKKK (3) at 329
pat4: KKKK (5) at 330 
pat4: KKKR (5) at 331 
pat7: PLHKKKK (5) at 327 
bipartite: none 

content of basic residues: 11.9% 
NLS Score: 0.77 

Final Results (k = 9/23) : 

39.1%: nuclear 

30.4%: cytoplasmic 

17.4%: mitochondrial 

8.7%: vesicles of secretory system 

4.3%: vacuolar 

prediction for 2193 is nuc (k=23) 
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FIGURE 33 
PSORT Prediction of Protein Localization

MITDISC: discrimination of mitochondrial targeting seq 

R content: 2 Hyd Moment (75): 2.10 

Hyd Moment (95): 4.73 G content: 3
D/E content: 2 S/T content: 1 

Score: -7.06 

Gavel: prediction of cleavage sites for mitochondrial preseq 

R-2 motif at 19 VRK | RE 
NUCDISC: discrimination of nuclear localization signals 

pat4 : none 

pat7: PEKEKRF (4) at 285 
pat7: PESLRKR (3) at 545 
bipartite: none 

content of basic residues: 11.8% 
NLS Score: 0.13 

Final Results (k = 9/23) : 

78.3%: nuclear 
13.0%: cytoplasmic 
8.7%: mitochondrial 

prediction for 2249 is nuc (k=23) 



Start  End  Feature      Seq
581    618  coiled coil  FQNSGNVNLT...LNLLSQDKGV



FIGURE 34 
Protein Family / Domain Matches, HMMer version 2
Searching for complete domains in PFAM

hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).

HMM file: /prod/ddm/seqanal/PFAM/pfam4.4/Pfam
Sequence file: /prod/ddm/wspace/orfanal/oa-script.10785.seq



Query: 18477 

Scores for sequence family classification (score includes all domains):

Model      Description                                   Score   E-value  N
pkinase    Eukaryotic protein kinase domain              241.3   1.3e-68  2
pkinase_C  Protein kinase C terminal domain               11.6   0.027    1
wap        WAP-type (Whey Acidic Protein) 'four-disulf   -12.6   8.1      1
RIO1       RIO1/ZK632.3/MJ0444 family                   -109.7   2        1



Parsed for domains:

Model      Domain  seq-f  seq-t     hmm-f  hmm-t         score  E-value
pkinase    1/2        35    180 ..      1  [illegible]   161.7  1.2e-44
RIO1       1/1        48    208 ..      1    222        -109.7  2
wap        1/1       516    551 ..      1     55         -12.6  8.1
pkinase    2/2       740    835 ..    149    278          75.5  7.4e-20
pkinase_C  1/1       836    864         1     31 [        11.6  0.027



Alignments of top-scoring domains: 

pkinase: domain 1 of 2, from 35 to 180: score 161.7, E = 1.2e-44 

*->yelleklGeGsfGkVykakhktgkivAvKilkkesls lr 

+ +++++ +G+fGkVy+++ gk++AvK++kk ++ +++ +++ ++ 
18477 35 FSIVKPISRGAFGKVYLGQKG-GKLYAVKVVKKADMInknmthqvQA 80

EiqilkrlsHpNIvrllgvfedtddhlylvmEymegGdLfdylrrngpls
E l+ + p+Iv l+++ + + +++ylvmEy+ gGd +++l+ g+++
18477 81 ERDALALSKSPFIVHLYYSLQ-SANNVYLVMEYLIGGDVKSLLHIYGYFD 129

ekeakkialQilrGleYLHsngivHRDLKpeNILldengtvKiaDFGLAr 
e+ a+k++++++ +l+YLH++gi+HRDLKp+N L++++g++K++DFGL++ 
18477 130 EEMAVKYISEVALALDYLHRHGIIHRDLKPDNMLISNEGHIKLTDFGLSK 179 

l<-*
+ 

18477 180 V 180 

pkinase: domain 2 of 2, from 740 to 835: score 75.5, E = 7.4e-20 

*->GTpwYmmAPEvilegrgysskvDvWSlGviLyElltggplfpgadlp
GTp+Y APE+ l+gr ++++vD+W+lGv L+E ltg
18477 740 GTPDYL-APEL-LLGRAHGPAVDWWALGVCLFEFLTG 774

aftggdevdqliifvlklPfsdelpktridpleelfrikkr rlplp
+Pf d ++ +++f+ +++++ + ++
18477 775 IPPFND ETPQQVFQNILKrdipWPEGE 801

sncSeelkdLlkkcLnkDPskRpGsatakeilnhpwf<-*
+ +S+++++ ++ +L+ D +kR +ke++ hp f
18477 802 EKLSDNAQSAVEILLTIDDTKRA--GMKELKRHPLF 835



pkinase_C: domain 1 of 1, from 836 to 864: score 11.6, E = 0.027
*->reIdWdkLEnkeiePPFKPkiksprDtsNFD<-*
+++dW+ L + + PF+P+ +++Dts+F+
18477 836 SDVDWENLQHQTM--PFIPQPDDETDTSYFE 864



FIGURE 35 



N        1  1  1
E-value  1.2e-44  2  8.1  7.4e-20  0.027
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Protein Family / Domain Matches, HMMer version 2 

Searching for complete domains in PFAM 

hmmpfam - search a single seq against HMM database 

HMMER 2.1.1 (Dec 1998)

Copyright (C) 1992-1998 Washington University School of Medicine 
HMMER is freely distributed under the GNU General Public License (GPL) . 



HMM file: /prod/ddm/seqanal/PFAM/pfam4.4/Pfam 
Sequence file: /tmp/orfanal.14376.aa



Query: 3695 

Scores for sequence family classification (score includes all domains) : 

Model Description Score E-value N 

pkinase Eukaryotic protein kinase domain 335.8 4.7e-97 1 

gla Vitamin K-dependent carboxylation/gamma-carb 4.7 8.7 1 



Parsed for domains: 

Model Domain seq-f seq-t hmm-f hmm-t score E-value 

pkinase 1/1 8 259 .. 1 278 [] 335.8 4.7e-97

gla 1/1 964 1005 .. 1 42 [] 4.7 8.7 

Alignments of top-scoring domains: 

pkinase: domain 1 of 1, from 8 to 259: score 335.8, E = 4.7e-97 

*->yelleklGeGsfGkVykakhk . tgkivAvKilk kesls . . lr 

ye+ +++G+G+f++V++a+h+ t+ +vA+Ki+++++ ++e+l++ +r 
3695 8 YEIDRTIGKGNFAVVKRATHLvTKAKVAIKIIDktqldEENLKkiFR 54

EiqilkrlsHpNIvrllgvfedtddhlylvmEymegGdLfdylrrngpls 
E+qi+k+l+Hp+I+rl+ v+e t++ +ylv+Ey+ gG+ fd+l+++g++ 
3695 55 EVQIMKMLCHPHIIRLYQVME-TERMIYLVTEYASGGEIFDHLVAHGRMA 103 

ekeakkialQilrGleYLHsngivHRDLKpeNILldengtvKiaDFGLAr 
ekea++ ++Qi+ ++ ++H ++ivHRDLK+eN+Lld n ++KiaDFG++ 
3695 104 EKEARRKFKQIVTAVYFCHCRNIVHRDLKAENLLLDANLNIKIADFGFSN 153 
ll...eklttfvGTpwYmmAPEvileg.rgysskvDvWSlGviLyElltg
l+++++ l+t +G+p+Y APE+ eg++++++kvD+WSlGv+Ly l++g
3695 154 LFtpgQLLKTWCGSPPYA-APEL-FEGkEYDGPKVDIWSLGVVLYVLVCG 201

gplfpgadlpaftggdevdqliifvlklPfsdelpktridpleelfrikk
lPf++ ++l++l + ++
3695 202 ALPFDG STLQNLRARVL 218

r.rlplpsncSeelkdLlkkcLnkDPskRpGsatakeilnhpwf<-*
+++++P S e+ +L++ +L +DP+kR+ ++++i++h+w+
3695 219 SgKFRI PFFMSTECEHLIRHMLVLDPNKRL--SMEQICKHKWM 259



FIGURE 36 
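
Rows of a "Parsed for domains" table such as the one for 3695 can be read into structured records for downstream comparison. This is a rough sketch that assumes the whitespace-separated hmmpfam row layout as printed in these figures (with the `..` and `[]` boundary markers intact); `DomainHit` and `parse_domain_row` are hypothetical names, not part of HMMER.

```python
import re
from typing import NamedTuple, Optional


class DomainHit(NamedTuple):
    model: str
    index: int      # the "1" in "1/2"
    count: int      # the "2" in "1/2"
    seq_from: int
    seq_to: int
    hmm_from: int
    hmm_to: int
    score: float
    evalue: float


# model, i/n, seq-f, seq-t, boundary marker, hmm-f, hmm-t, marker, score, E-value
ROW = re.compile(
    r"^(\S+)\s+(\d+)/(\d+)\s+(\d+)\s+(\d+)\s+[.\[\]]{2}\s+"
    r"(\d+)\s+(\d+)\s+[.\[\]]{2}\s+(-?\d+(?:\.\d+)?)\s+(\S+)$"
)


def parse_domain_row(line: str) -> Optional[DomainHit]:
    """Parse one table row; return None for lines that are not rows."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    model, i, n, sf, st, hf, ht, score, evalue = match.groups()
    try:
        return DomainHit(model, int(i), int(n), int(sf), int(st),
                         int(hf), int(ht), float(score), float(evalue))
    except ValueError:
        # e.g. an OCR-damaged E-value that is not a valid number
        return None


hit = parse_domain_row("pkinase 1/1 8 259 .. 1 278 [] 335.8 4.7e-97")
print(hit)
```

Returning `None` for unparseable lines keeps OCR-damaged rows from being silently misread; such rows would need manual correction first.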
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Protein Family / Domain Matches, HMMer version 2 

Searching for complete domains in PFAM 

hmmpfam - search a single seq against HMM database 

HMMER 2.1.1 (Dec 1998) 

Copyright (C) 1992-1998 Washington University School of Medicine 
HMMER is freely distributed under the GNU General Public License (GPL) . 



HMM file: /prod/ddm/seqanal/PFAM/pfam4.4/Pfam 
Sequence file: /prod/ddm/wspace/orfanal/oa-script.26048.seq



Query: 13302 

Scores for sequence family classification (score includes all domains) : 

Model Description Score E-value N 

pkinase Eukaryotic protein kinase domain 101.0 6.3e-27 2 



Parsed for domains: 

Model Domain seq-f seq-t hmm-f hmm-t score E-value 

pkinase 1/2 141 184 .. 75 118 .. 28.9 6.4e-07

pkinase 2/2 223 315 .. 151 278 .] 71.8 7.8e-19 

Alignments of top-scoring domains: 

pkinase: domain 1 of 2, from 141 to 184: score 28.9, E = 6.4e-07 

*->gGdLfdylrrngplsekeakkialQilrGleYLHsngivHRDLK<-* 
+Gd+++++r++ +++e+ea +++Q++ +l+++H +g v RDLK

13302 141 HGDMHSLVRSRHRIPEPEAAVLFRQMATALAHCHQHGLVLRDLK 184 

pkinase: domain 2 of 2, from 223 to 315: score 71.8, E = 7.8e-19 

*->pwYmmAPEvileg..rgysskvDvWSlGviLyElltggplfpgadlp
p Y+ PE+ l+++ ++ ++++DvWSlGv L+ +l g
13302 223 PAYV-GPEI-LSSraSYSGKAADVWSLGVALFTMLAG 257

aftggdevdqliifvlklPfsdelpktridpleelfrikkr.rlplpsnc
+ Pf+d + lf ++r+++ lp +
13302 258 HYPFQD SEPVLLFGKIRRgAYALPAGL 284

SeelkdLlkkcLnkDPskRpGsatakeilnhpwf<-*
S +++ L++++L++ P++R+ ta+ il hpw+
13302 285 SAPARCLVRCLLRREPAERL--TATGILLHPWL 315



FIGURE 37 
Protein Family / Domain Matches, HMMer version 2

Searching for complete domains in PFAM 

hmmpfam - search a single seq against HMM database 

HMMER 2.1.1 (Dec 1998) 

Copyright (C) 1992-1998 Washington University School of Medicine 
HMMER is freely distributed under the GNU General Public License (GPL) . 

HMM file: /prod/ddm/seqanal/PFAM/pfam4.4/Pfam

Sequence file: /prod/ddm/wspace/orfanal/oa-script.25990.seq

Query: 2208 

Scores for sequence family classification (score includes all domains) : 

Model Description Score E-value N 

pkinase Eukaryotic protein kinase domain 93.4 8.2e-25 2 

Parsed for domains: 

Model Domain seq-f seq-t hmm-f hmm-t score E-value 

pkinase 1/2 156 174 .. 1 19 [. 6.1 1.3

pkinase 2/2 266 501 .. 45 270 .. 87.2 4.4e-23

Alignments of top-scoring domains: 

pkinase: domain 1 of 2, from 156 to 174: score 6.1, E = 1.3 

*->yelleklGeGsfGkVykak<-* 
y +++ +G+G ++ Vy+a+ 
2208 156 YLIGQSIGKGCSAAVYEAT 174 



pkinase: domain 2 of 2, from 266 to 501: score 87.2, E = 4.4e-23

*->krls . HpNIvrllgvfed tdd

k+l +HpNI+r+l +f+ .+ ++++ ++ +++ ++++ ++++
2208 266 KQIApHPNIIRVLMFTSsvpllpgalvdypdvlpsrlhpeglgHGR 312

hlylvmEymegGdLfdylrrngplsekeakkialQilrGleYLHsngivH
l lvm ++ L++yl n+ s++ a ++ lQ+l+G+++L +gi H
2208 313 TLFLVMKNYPC-TLRQYLCVNT-PSPRLAAMMLLQLLEGVDHLVQQGIAH 360

RDLKpeNILlden. . . .gtvKiaDFGLArll eklttfvGT 

RDLK++NIL-H ++++ + iaDFG + ++++++++ +++G 
2208 361 RDLKSDNILVELDpdgcPWLVIADFGCCLADesiglqlpfsSWYVDRGGN 410 

pwYmmAPEvileg rgysskvDvWSlGviLyElltggplfpgadl 

m APEv +++++ + sk+D W++G i yE++ 
2208 411 GCLM-APEV-STArpgpraVIDYSKADAWAVGAIAYEIFGL 449 

paftggdevdqliifvlklPfsdelpktridpleelfrikkr. . .rlplp 
Pf++ ++ ++1 + +++ +++ lp 

2208 450 VNPFYG QGKAHLESRSYQeaqLPALP 475 

sncSeelkdLlkkcLnkDPskRpGsatak<-*
+++++++++L++ +L++ +skRp +a+
2208 476 ESVPPDVRQLVRALLQREASKRP--SAR 501

FIGURE 38 
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Searching for complete domains in PFAM 

hmmpfam - search a single seq against HMM database 

HMMER 2.1.1 (Dec 1998) 

Copyright (C) 1992-1998 Washington University School of Medicine 
HMMER is freely distributed under the GNU General Public License (GPL) 



HMM file: /prod/ddm/seqanal/PFAM/pfam4.4/Pfam
Sequence file: /prod/ddm/wspace/orfanal/oa-script.6295.seq



Query: 2193 

Scores for sequence family classification (score includes all domains):

Model    Description                       Score  E-value  N
pkinase  Eukaryotic protein kinase domain  240.0  3.3e-68  1



Parsed for domains:

Model    Domain  seq-f  seq-t     hmm-f  hmm-t     score  E-value
pkinase  1/1        26    278 ..      1    271 [.  240.0  3.3e-68


Alignments of top-scoring domains: 

pkinase: domain 1 of 1, from 26 to 278: score 240.0, E = 3.3e-68 

*->yelleklGeGsfGkVykakhk.tgkivAvKilkkesls l

+++l+ +G+GsfGkV ++ ++t+k +A+K ++k++ ++++ ++ +
2193 26 FQILRAIGKGSFGKVCIVQKRdTEKMYAMKYMNKQQCIerdevrnvF 72

rEiqilkrlsHpNIvrllgvfedtddhlylvmEymegGdLfdylrrngpl 
rE++il+++ H ++v+l+++f+ +++ + +v +++ gGdL+ +l++n ++ 
2193 73 RELEILQEIEHVFLVNLWYSFQ-DEEDMFMVVDLLLGGDLRYHLQQNVQF 121

sekeakkialQilrGleYLHsngivHRDLKpeNILldengtvKiaDFGLA 
se+ ++ ++ +++ +1+YL +++i+HRD+Kp+NILlde+g+ ++DF +A 
2193 122 SEDTVRLYICEMALALDYLRGQHIIHRDVKPDNILLDERGHAHLTDFNIA 171 

rll...eklttfvGTpwYmmAPEvi...leg.rgysskvDvWSlGviLyE
+ -H++e++t + GT +Ym APE+ ++ ++g++gys +vD+WS+Gv+ yE
2193 172 TilkdgERATALAGTKPYM-APEIFhsfVNGgTGYSFEVDWWSVGVMAYE 220

lltggplfpgadlpaftggdevdqliifvlklPfsdelpktridpleelf
ll g P++ i+ +++
2193 221 LLRG WRPYD IHSSNAVE 237

rikkr rlplpsncSeelkdLlkkcLnkDPskRpGsatake<-*
++++ ++ +++S+e+ Ll+k+L+++P+ R+ + +
2193 238 SLVQLfstvSVQYVPTWSKEMVALLRKLLTVNPEHRL--SSLQ 278



FIGURE 39 
HMM file: /prod/ddm/seqanal/PFAM/pfam4.4/Pfam
Sequence file: /prod/ddm/wspace/orfanal/oa-script.20707.seq



Query: 2249 

Scores for sequence family classification (score includes all domains):

Model     Description                       Score  E-value
pkinase   Eukaryotic protein kinase domain  186.3  4.9e-52
filament  Intermediate filament proteins      3.0  8.7



Parsed for domains:

Model     Domain  seq-f  seq-t     hmm-f  hmm-t     score
pkinase   1/3       167    239 ..      1     66 [.   60.3
pkinase   2/3       379    523 ..     66    183 ..  121.9
pkinase   3/3       564    582 ..    257    278 .]    1.9
filament  1/1       600    618 ..     57     75 ..    3.0



Alignments of top-scoring domains: 

pkinase: domain 1 of 3, from 167 to 239: score 60.3, E = 1.2e-15 

*->yelleklGeGsfGkVykakhk.tgkivAvKilk.kesls lr

+e l +lG+G++G+Vyk+++k +g+ +A+K++ k +++ + lr
2249 167 FEELAILGKGGYGRVYKVRNKlDGQYYAIKKILiKGATKtvcmkvLR 213

EiqilkrlsHpNIvrllgvfedtddhl<-*
E+++l+ l+HpNIv ++ ++ ++ h+
2249 214 EVKVLAGLQHPNIVGYHTAWI-EHVHV 239



pkinase: domain 2 of 3, from 379 to 523: score 121.9, E = 1e-32

*->lylvmEymegGdLfdylrrng plsekeakkialQ 

1++ m+++e +L+d+++++++++++ +++ ++ + a ki+ + 
2249 379 LHIQMQLCEL-SLWDWIVERNkrgreyvdesacpYVMANVATKIFQE 424 

ilrGleYLHsngivHRDLKpeNILlden.gtvKiaDFGLArll

+++G+ Y+H++givHRDLKp+NI+l+ ++ +vKi+DFGLA+ +++++ 
2249 425 LVEGVFYIHNMGIVHRDLKPRNIFLHGPdQQVKIGDFGLACTDilqkntd 474 

eklttfvGTpwYmmAPEvilegrgysskvDvWSlGviLyEl

+++++++++++t++vGT Y +PE leg++y+ k+D++SlGv+L El
2249 475 wtnrngkrtPTHTSRVGTCLYA-SPEQ-LEGSEYDAKSDMYSLGVVLLEL 522

l<-*
+
2249 523 F 523



pkinase: domain 3 of 3, from 564 to 582: score 1.9, E = 19 

*->nkDPskRpGsatakeilnhpwf<-* 
+++ s+Rp +a ++l++ f
2249 564 RRNSSQRP--SAIQLLQSELF 582



FIGURE 40 



E-value  1.2e-15  1e-32  19  8.7
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Relative Expression (normal Colon as Reference) 
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18477/Cyclin B1 Expression in Colon 
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FIGURE 44 



